82 results found.
Written
Corpus,
Language Type:
Trilingual
Languages:
English Finnish Turkish
Availability:
Freely Available
License:
<Not Specified>
Size:
millions of sentences
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology
- Paper track: Morphology, word segmentation, tagging and chunking
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Stig-Arne Grönroos | Aalto University, Department of Signal Processing and Acoustics | FI |
| Author 2 | Sami Virpioja | Aalto University | FI |
| Author 3 | Peter Smit | Aalto University | None |
| Author 4 | Mikko Kurimo | Aalto University | FI |
| Main Contact | Stig-Arne Grönroos | Aalto University, Department of Signal Processing and Acoustics | None |
Documentation:
<Not Specified>
Language Type:
Trilingual
Languages:
Finnish Russian Turkish
Availability:
Freely Available
License:
Gnu
Size:
102 KByte
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: A Novel Evaluation Method for Morphological Segmentation
- Paper track: Evaluation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Javad Nouri | University of Helsinki | FI |
| Author 2 | Roman Yangarber | University of Helsinki | FI |
| Main Contact | Javad Nouri | University of Helsinki | None |
Documentation:
Documentation available in English
Written
Corpus,
Language Type:
Trilingual
Languages:
Finnish German Spanish
Availability:
Freely Available
License:
Creative Commons Attribution-ShareAlike 3.0 Unported
Size:
1828665 entries
Production Status:
Existing-used
Use:
Morphological Analysis
- Paper title: Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks
- Paper track: Applications
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country | Affiliation 2 | Country 2 |
|---|---|---|---|---|---|
| Author 1 | Carsten Schnober | UKP Lab, TU Darmstadt | DE | | |
| Author 2 | Steffen Eger | UKP Lab, TU Darmstadt | N/A | | |
| Author 3 | Erik-Lân Do Dinh | UKP Lab, Technische Universität Darmstadt | DE | | |
| Author 4 | Iryna Gurevych | UKP Lab, Technische Universität Darmstadt | DE | Ubiquitous Knowledge Processing (UKP) Lab | DE |
| Main Contact | Carsten Schnober | UKP Lab, TU Darmstadt | None | | |
Documentation:
English, included in dataset
Language Type:
Trilingual
Languages:
Finland-Swedish Sign Language Finnish Finnish Sign Language
Availability:
Not Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-in progress
Use:
Linguistic Research
- Paper title: Taking non-manuality into account in collecting and analyzing Finnish Sign Language video data
- Paper track: Poster with Demo
- Paper status: Accept as poster with demo
| Author Number | Name | Affiliation | Country | Affiliation 2 | Country 2 | Affiliation 3 | Country 3 |
|---|---|---|---|---|---|---|---|
| Author 1 | Anna Puupponen | University of Jyväskylä | FI | | | | |
| Author 2 | Tommi Jantunen | <Not Specified> | None | University of Jyväskylä | FI | University of Jyväskylä | None |
| Author 3 | Ritva Takkinen | University of Jyväskylä | None | | | | |
| Author 4 | Tuija Wainio | University of Jyväskylä | None | | | | |
| Author 5 | Outi Pippuri | University of Jyväskylä | None | | | | |
| Main Contact | Anna Puupponen | University of Jyväskylä | None | | | | |
Documentation:
<Not Specified>
Language Type:
Trilingual
Languages:
Czech English Finnish
Availability:
Freely Available
License:
<Not Specified>
Size:
6K sentences
Production Status:
Newly created-in progress
Use:
Parsing and Tagging
- Paper title: Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
- Paper track: Evaluation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Kira Droganova | Institute of Formal and Applied Linguistics (ÚFAL MFF UK), Faculty of Mathematics and Physics, Charles University | CZ |
| Author 2 | Daniel Zeman | Charles University, Faculty of Mathematics and Physics | CZ |
| Author 3 | Jenna Kanerva | TurkuNLP Group, University of Turku | FI |
| Author 4 | Filip Ginter | TurkuNLP Group, University of Turku | FI |
| Main Contact | Kira Droganova | Institute of Formal and Applied Linguistics (ÚFAL MFF UK), Faculty of Mathematics and Physics, Charles University | None |
Documentation:
<Not Specified>
Written
Corpus,
Language Type:
Monolingual
Languages:
Finnish
Availability:
Freely Available
License:
Size:
None
Production Status:
Use:
Language Modelling
- Paper title: FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics
- Paper track: 12.19 Other topics in Spoken Language Processing: /Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Katri Leino | OpenSubtitles | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Finnish
Availability:
From Owner
License:
CC - BY - NC
Size:
4132665850 tokens
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics
- Paper track: 12.19 Other topics in Spoken Language Processing: /Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Katri Leino | The Suomi 24 Sentences Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Finnish
Availability:
Freely Available
License:
Size:
22210 words
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics
- Paper track: 12.19 Other topics in Spoken Language Processing: /Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Katri Leino | FinChat | /N |
Documentation:
English
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese Dutch Finnish French German Greek Hungarian Japanese Russian Spanish
Availability:
Freely Available
License:
Apache-2.0
Size:
None
Production Status:
Existing-used
Use:
Speech Synthesis
- Paper title: One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
- Paper track: 7.14 Cross-lingual and multilingual aspects in spe/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Tomáš Nekvinda | CSS10 | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Dari/Pashto Dutch English Finnish French Hindi Icelandic Indonesian Japanese Lithuanian Malay Mandarin Nepali Portuguese Punjabi Romanian Slovenian Spanish
Availability:
From Owner
License:
CreativeCommons
Size:
467 hours
Production Status:
Newly created-finished
Use:
Person Identification
- Paper title: JukeBox: A Multilingual Singer Recognition Dataset
- Paper track: 4.3 Speaker verification and identification/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Anurag Chowdhury | JukeBox | /N |
Documentation:
Documentation in English language will be made available upon publication of the dataset.




